Bioinformatics analyses of potential ACLF biological mechanisms and identification of immune-related hub genes and vital miRNAs

Acute-on-chronic liver failure (ACLF) is a critical and refractory disease and a hepatic disorder accompanied by immune dysfunction. Thus, it is essential to explore key immune-related genes of ACLF and investigate its mechanisms. We used two public datasets (GSE142255 and GSE168048) to perform various bioinformatics analyses, including WGCNA, CIBERSORT, and GSEA. We also constructed an ACLF immune-related protein–protein interaction (PPI) network to obtain hub differentially expressed genes (DEGs) and predict corresponding miRNAs. Finally, an ACLF rat model was established to verify the results. A total of 388 DEGs were identified in ACLF, including 162 upregulated and 226 downregulated genes. The enrichment analyses revealed that these DEGs were mainly involved in inflammatory-immune responses and biosynthetic metabolic pathways. Twenty-eight gene modules were obtained using WGCNA and the coral1 and darkseagreen4 modules were highly correlated with M1 macrophage polarization. As a result, 10 hub genes and 2 miRNAs were identified to be significantly altered in ACLF. The bioinformatics analyses of the two datasets presented valuable insights into the pathogenesis and screening of hub genes of ACLF. These results might contribute to a better understanding of the potential molecular mechanisms of ACLF. Finally, further studies are required to validate our current findings.

Differentially expressed genes (DEGs). The identification of DEGs was performed using the "limma" R package provided by Sangerbox tools, a free online platform for data analyses (http:// www. sange rbox. com/ tool). The p-values and adjusted p-values were calculated using t-tests. Genes in each sample that met |log 2 (fold change) |> 1 and adjusted p < 0.05 were retained. Enrichment analyses. Gene Set Enrichment Analysis (GSEA) is a computational method used to determine whether a pre-defined gene set can exhibit consistent significant differences in two biological states 9 . The c2.cp. kegg.v7.4.symbols.gmt was chosen as a control to assess ACLF-related pathways. Based on gene expression profiles and phenotypic groupings, a minimum gene set of 5 and a maximum gene set of 5000 was selected. One thousand resamplings were performed and a p-value < 0.05 and an FDR < 0.25 were considered statistically significant. Then, to explore which pathways were involved in ACLF pathogenesis, the complete gene expression matrix information of GSE142255 was uploaded to GSEA, and the enrichment results satisfying |NES|≥ 1.0 and NOM p < 0.05 were considered statistically significant. The Metascape 10 database and KEGG database (https:// www. kegg. jp/ kegg/ kegg1. html) were used to perform Gene Ontology (GO) and KEGG pathway enrichment analyses by submitting specific gene lists, including upregulated and downregulated DEGs, with min overlap = 3, min enrichment = 1.5, and p < 0.01. Besides, biological processes (BP) networks of DEGs were analyzed and visualized using Cluego in Cytoscape 11 . The kappa score was set as 0.4. GO terms with at least 9 gene hits and gene percentages > 9% were selected and a p < 0.05 was considered statistically significant.
Immune cell infiltration analysis and WGCNA. CIBERSORT is an algorithm for estimating cellular compositions in tissues based on normalized gene expression data 12 . In the current study, whole blood mRNA expression profile data from ACLF patients and HS were extracted from GSE142255. CIBERSORT was used to assess the relative proportions of 22 immune cells. The CIBERSORT method was chosen to calculate the immuno-infiltrating cell scores for each sample based on the expression profiles obtained using the IOBR R package. To identify gene co-expression modules, the top 15,000 genes with the most significant differences between ACLF patients and HS were used to construct a co-expression network via the WGCNA package provided by Sangerbox tools 13 . The proper power was calculated based on the pickSoftThreshold function. Gene profiles were considered as unsigned co-expression networks. The execution parameters were: power = 14, blockSize = 7000, minModuleSize = 20, deepSplit = 2, mergeCutHeight = 0.25, hub cut = 0.9, net threshold = 0. www.nature.com/scientificreports/ The gene co-expression modules obtained were further used for association analysis with primarily infiltrating immune cells indicated by CIBERSORT. The correlations between modules and immune infiltrative features were analyzed and presented as heat maps. The grey module represents a collection of genes that could not be assigned to any module. Gene Significance (GS) represents the association between gene expression and each feature, while Module Membership (MM) is the correlation of gene expression with each module. The correlation between MM and GS was calculated using Pearson correlation analysis. The identification of gene clus-  Laboratory Animal Technology Co., Ltd., Beijing, China. After one week of acclimatization, rats were randomly divided into two groups: normal control (NC, n = 9) and ACLF (ACLF, n = 9). The ACLF rat model was established as previously described. Briefly, rats were intraperitoneally injected with 40% carbon tetrachloride (CCl 4 ) olive oil solution (1.5 ml/kg) twice a week for ten weeks and an acute liver injury was induced by intraperitoneal injection of LPS (100 μg/kg) and D-GalN (400 mg/kg). After 12 h, all rats were sacrificed and the livers were collected and stored at − 80 ℃. Histological observation.
To make tissue sections, rat liver tissues were first embedded in paraformaldehyde, then in paraffin. Processed tissue sections were stained with hematoxylin-eosin (HE) and Masson.
Quantitative real-time PCR (qRT-PCR) analysis. Briefly, the RNA was extracted from livers using Trizol and 50 µL of RNase-free water was added to lyse the RNA. The RNA was reverse transcribed to cDNA by adding primers synthesized by the Beijing Tianyi Huiyuan Biotechnology Co., LTD. Then, the qRT-PCR reaction was performed using the TB Green® Premix Ex Taq™ II (Tli RNaseH Plus) kit mixed with cDNA. The reaction conditions comprehended 40 cycles of 95 °C for 10 s, and 60 °C for 30 s after pre-denaturation for 5 min. The 2 −ΔΔCt method was used to analyze the expression of mRNAs and miRNAs. The mRNA expression was normalized to β -actin, while miRNA was normalized to U6. The primer sequences are shown in Table 2.
Ethical approval. All animal experiments were performed according to ARRIVE guidelines and carried out in the Experimental Animal Center of Capital Medical University. All protocols were carried out in accordance with current legislation relating to animals and experiments involving animals and were approved by the Animal Experiments and Experimental Animal Welfare Committee of Capital Medical University (Beijing, China).

Results
Screening of DEGs. Two microarray datasets (GSE12255 and GSE168048) were used to screen DEGs with the following criteria: adjusted p-value < 0.05 and |log 2 FC|> 1. A total of 388 DEGs (226 downregulated and 162 upregulated) were identified after removing unannotated probes and duplicated genes in GSE142255 ( Fig. 2A,C, and Supplementary Table S1). Meanwhile, 386 DEGs were upregulated and 223 were downregulated in the GSE168048 dataset (Fig. 2B,D, and Supplementary Table S2).
Pathway enrichment analysis and functional annotation of GSE142255. All annotated gene information of ACLF and HS in GSE142255 was uploaded to the GSEA software for analysis at a holistic level. Pathways with |NES|≥ 1.0 and NOM p-value < 0.05 were considered as significantly enriched gene sets. The GSEA results revealed significant enrichment in immune-related functions and biosynthesis metabolic pathways (Fig. 3), including primary immunodeficiency, and fructose and mannose metabolism. Using Metascape, 162 upregulated and 226 downregulated DEGs were enriched in GSE142255 according to the KEGG pathway analysis (Supplementary Table S3). Downregulated DEGs were mainly engaged in multiple immune responses and inflammatory signaling pathways, including Th1 and Th2 cell differentiation, T cell receptor signaling pathway, Natural killer cell-mediated cytotoxicity, Chemokine signaling pathway, NF-kappa B signaling pathway, and Apoptosis. Meanwhile, upregulated DEGs affected biosynthetic and substance metabolism pathways, including Starch and sucrose metabolism, MAPK signaling pathway, Fatty acid biosynthesis, Insulin signaling, Cytokinecytokine receptor interaction, Fructose and mannose metabolism, and Central carbon metabolism in cancer

Correlation analysis of immune infiltration and gene modules.
As reported in GSE142255, ACLF patients are characterized by dysregulated blood immune cells, such as neutrophils, and several lymphocyte subsets. Herein, we used the CIBERSORT algorithm to investigate the correlation between immune infiltration and gene co-expression modules. The proportion of each immune cell type in each sample is shown in a stacked bar graph (Fig. 7A). The proportions of M0 and M1 macrophages increased, while the proportion of monocytes was not significantly altered, indicating significant macrophage differentiation and polarization in ACLF (Fig. 7B). Moreover, the coral1 module had significant negative correlations with M0 (r =− 0.63, p < 0.01) and M1 (r =− 0.65, p < 0.01) macrophages. On the other hand, the darkseagreen4 module had a significant positive correlation with M0 (r = 0.7, p < 0.01) and M1 (r = 0.67, p < 0.01) macrophages (Fig. 8).

Construction and analysis of a PPI network of macrophage-related DEGs.
To screen out macrophage-related DEGs, 297 genes were obtained from the intersection of the coral1 and darkseagreen4 modules  www.nature.com/scientificreports/ and DEGs (Fig. 9A). Then, the PPI network was constructed, resulting in 168 nodes, 469 edges, and 7 gene clusters (Fig. 9B,C). The GO annotation and KEGG pathway analyses of the PPI network are shown in Fig. 10 and Supplementary Table S4.
Identification of hub genes. Based on the PPI network, we used ten topological analysis methods provided by CytoHubba to calculate and rank the parameters of each node. Then, ten hub genes were obtained: www.nature.com/scientificreports/ RSL1D1, RPS5, CCL5, HSPA8, PRKCQ, MMP9, ITGAM, LCK, IL7R, and HP (Fig. 11). Further, we explored the genes on GSE168048. We found that the expression of RPS5, PRKCQ, MMP9, LCK, ITGAM, IL7R, and CCL5 was statistically different between surviving and dead samples (Fig. 12A). The expression remained consistent in both datasets, suggesting that these genes are associated with the 28-day survival status of ACLF patients and might be critical in ACLF.
Prediction of miRNAs related to hub genes. The ten hub genes were submitted to miRNet 2.0 for miRNA prediction and network construction. The miRNAs with degree values ≥ 3 are shown in Fig. 12B. The degree values of Has-mir-1-3p, has-mir-9-5p, has-mir-16-5p, has-mir-182-5p, and has-mir-26a-5p were equal to 4. Hence, they might be crucial miRNAs during ACLF development.

Experimental verification in vivo.
We assessed the pathological damage in ACLF model rats by HE, masson staining. The liver lobules of the rats in NC had a regular structure, and there were no necrotic cells, without obvious inflammatory cell infiltration or fibrous hyperplasia. However, the hepatocytes in the ACLF group were disorganized, with large and sub-large pieces of severe necrosis. A large infiltration of inflammatory cells is seen in the visual field, as well as marked hepatic sinusoidal haemorrhage. The hilar region is bridged by proliferating collagen fibres and pseudobullets are formed (Fig. 13A). Furthermore, liver sections were double-stained for iba1 (macrophages marker) and CD68 (M1 marker) by immunofluorescence 16,17 . A significant increase in iba1 and CD68 positive staining cells was detected in the livers of ACLF rats, suggesting that ACLF macrophages were recruited and induced to M1 polarization (Fig. 13B,C). Consistent with the results predicted by CIBERSORT, a significant macrophage infiltration occurred in ACLF rats. Subsequently, we used qRT-PCR to examine the expression levels of hub genes and miRNAs in ACLF rats compared to controls. The results showed a significant decrease in the mRNA expression of RSL1D1, RPS5, CCL5, HSPA8, PRKCQ, LCK, and HP (p < 0.05, p < 0.01, p < 0.001), while MMP9, ITGAM, and IL7R were significantly increased (p < 0.05, p < 0.01) (Fig. 13D). Then, to validate our predicted miRNAs, the miRNAs homologous to humans were matched using the miRBase database (https:// mirba se. org/) and analyzed via qRT-PCR. The expression of mir-16-5p increased (p < 0.05) and mir-26a-5p decreased in ACLF rats (p < 0.05). In contrast, the expression of mir-182 and mir-9a-5p did not differ from controls (Fig. 13E).

Discussion
ACLF is a systemic inflammatory disease accompanied by immune dysfunction and disturbances in energy metabolism. ACLF has a high short-term mortality, which increases with the incidence of failing organs. Although many studies regarding ACLF have been performed, its underlying mechanisms remain to be fully www.nature.com/scientificreports/ explored. Meanwhile, it has been demonstrated that a strong immune response is a key mechanism of ACLF 18,19 . Therefore, we explored the intrinsic mechanisms of immune cell infiltration in ACLF using an effective informational biology approach.
Herein, we identified macrophage-associated co-expressed gene modules in ACLF for the first time using a combination of WGCNA and CIBERSORT. We identified immune-related key genes and provided new pathways for future studies on effective targets for ACLF treatments. After bioinformatics and qRT-PCR experiments, 10 immune-related hub genes were identified and mir-16-5p and mir-26a-5p were validated. Altogether, these results might provide new strategies for understanding the pathogenesis of ACLF and developing targeted therapeutic molecules.
In the present study, we evaluated potential pathways and biological processes of ACLF using enrichment analyses. GSEA is characterized by the analysis of collections of genes rather than individual genes, which helps www.nature.com/scientificreports/ to avoid the inability to reproduce individual high-scoring genes due to poor annotation. In the GSE142255 dataset, the GSEA indicated that immune response, inflammatory pathways, and metabolic pathways were mainly involved in ACLF. Then, we found that the downregulated DEGs were mainly engaged in immune response and inflammatory reaction, while upregulated ones regulated biosynthetic and substance metabolism pathways. These results reflected two major biological processes that co-occurred during the progression of ACLF regulated by different genes: imbalance of immune-inflammatory response and energy metabolism. According to the BP analysis, immune cell activation, differentiation, proliferation, and migration were the major biological processes in ACLF, leading to an expanding inflammatory response. Recently, it was reported that excessive activation of the immune response not only causes a systemic inflammatory response, which subsequently mediates immunerelated tissue damage, but also leads to high energy demand. Consequently, the immune system competes with www.nature.com/scientificreports/ peripheral organs for energy, triggering an immune-related energy crisis in the organism and increasing the risk of organ failure 18,20,21 . Overall, it was suggested that the hyperimmune response and dysfunctional energy metabolism in ACLF are biologically coupled processes, largely influencing ACLF progress. CIBERSORT is a widely used deconvolution machine algorithm for estimating the composition of immune cells. It shows superior performance in the identification and fine delineation of immune cells when processing highly noisy mixture data 8,22 . Here, the CIBERSORT results showed that the population of M0 and M1 macrophages was significantly increased in ACLF patients compared to healthy subjects. We also labeled markers on the surface of macrophages by immunofluorescence and validated their increase in M1 macrophages in the liver of ACLF rats. Macrophages can be polarized into M1 or M2 phenotypes. M1 macrophages can release significant influxes of inflammatory factors and induce cytokine storms with pro-inflammatory effects. On the other hand, M2 macrophages secrete tissue repair factors and exhibit anti-inflammatory and reparative properties 23,24 . Kupffer cells, a type of macrophage that resides in the hepatic sinusoids, mainly perform innate immune and inflammatory responses 25 . To search for highly related gene modules, WGCNA identifies similar gene clusters and gene modules by hierarchical clustering. WGCNA also supports the analysis of correlations between gene modules and phenotypic traits 26,27 . To identify gene clusters associated with macrophages, we performed WGCNA and identified gene modules closely related to M1 macrophage polarization, including the coral1 (containing 3631 genes) and darkseagreen4 (containing 307 genes) modules. Based on the WGCNA's gene modules, we screened immune-related DEGs and constructed a PPI network to find ACLF immune-related hub genes.
Ten hub genes were screened using CytoHubba: RSL1D1, RPS5, CCL5, HSPA8, PRKCQ, MMP9, ITGAM, LCK, IL7R, and HP ( Table 3). The differential expression of hub genes was further confirmed by qRT-PCR in ACLF rats. Overall, MMP9, ITGAM, and IL7R were highly expressed during ACLF. Furthermore, ACLF has high 28-day mortality that is closely related to the degree of organ failure in patients. Hence, we used the GSE168048 microarray containing gene expression data of ACLF patients who survived or died at 28 days for further investigation. We verified that the expression of RPS5, PRKCQ, MMP9, LCK, ITGAM, IL7R, and CCL5 www.nature.com/scientificreports/ differed between surviving and deceased patients, suggesting that these genes might be closely related to ACLF progression and could be used to predict ACLF survival status at 28 days. Notably, downregulated genes were mostly involved in the promotion of immune response, while the upregulated gene, MMP9, was associated with hepatocyte necrosis. These results suggested that the coexistence of immune paralysis and cell necrosis is a potential ACLF mechanism leading to poor prognosis. Moreover, miRNAs are potential targets in numerous diseases and control various biological processes. As short-chain RNAs with a coding length of only about 22 nucleotides, miRNAs cannot directly be translated into proteins, but rather regulate protein synthesis by disrupting the stability of target mRNAs and inhibiting their translation through complementary pairing 28 . Studies have explored the relationship between miRNAs and diseases and proposed the use of miRNAs as a biomarker for disease diagnosis and prognosis as well as a small molecule drug target 29 . Considering the time and cost of experimental studies, we adopted a database approach combined with experimental validation to study miRNAs that were significantly altered in ACLF. The miRNet 2.0 integrates data from 15 prediction databases and provides visual analytics to enable a more comprehensive and convenient evaluation of the interactions between miRNAs, mRNAs, lncRNAs, and transcription factors 15 . Herein, we used miRNet 2.0 to construct a miRNA-hub genes network to explore potential miRNAs related to ACLF. During the validation, two miRNAs were significantly altered in ACLF rats: mir-16-5p presented increased expression and mir-26a-5p showed decreased expression. M1 macrophages can transfer mir-16-5p to gastric cancer (GC) cells by secreting exosomes and triggering a T-cell immune response to suppress tumor formation by decreasing the expression of PD-L1 30 . It has been demonstrated that mir-26a-5p decreases with ACLF progression and is associated with worsening liver function and increasing liver disease severity 31 . However, further studies are needed to validate the potential association between miRNA regulatory networks and ACLF. www.nature.com/scientificreports/ Predicting potential disease-associated miRNAs is very meaningful and challenging. Thus, researchers have developed several computational methods and models to perform those predictions. These models can www.nature.com/scientificreports/ be classified into four categories: score functions, complex network algorithms, machine learning, and multiple biological information 29 . For example, Chen et al. 32 proposed an inductive matrix filling model (IMCMDA) for miRNA-disease association prediction. By integrating miRNA and disease similarity information into the matrix-populated objective function, a low-dimensional representation matrix of miRNAs and diseases was obtained, which was finally combined into a miRNA-disease association score matrix. Chen et al. 33 improved the HGIMDA model and further provided the MDHGI model. This model first decomposes the miRNA-disease  www.nature.com/scientificreports/ predicting potential miRNA-disease associations through the random forest. This model combines feature and deep forest-integrated learning models to enhance the prediction accuracy. Bioinformatics-based prediction methods are constantly evolving. Nevertheless, different models have almost different predictive performance for the same datasets. Hence, it is not only necessary to collect large-scale experimental data but also consider other algorithms to improve predictive performance for specific diseases.
Besides the methods covered in this study, the multi-field predictive research of bioinformatics offers a unique perspective on the exploration of diagnostic and therapeutic tools for diseases, not only for ACLF. Currently, with the development of genome-wide technologies, there is an increasing need to explore models that detail the exact mechanisms in which genes and proteins interact to form complex living systems. A gene regulatory network (GRN) is a network of interactions between gene molecules. An improved Markov blanket discovery algorithm based on IMBDANET has been proposed and can effectively distinguish between direct and indirect regulatory genes from GN and reduce the false-positive rate in the network inference process 36 . Additionally, www.nature.com/scientificreports/ RWRNET is an algorithm of Random Walk with Restart (RWR) modified by restart probability, initial probability vector, and roaming network applied to GRN that continuously maps the global topology of the network and estimates the affinity between nodes in the network through circular iterations until all nodes are traversed 37 . In contrast, IMBDANET uses a Markov blanket discovery algorithm for network topology analysis and processing, identifying direct and indirect regulatory genes while solving the problem of isolated nodes. On the other   www.nature.com/scientificreports/ Here, we combined WGCNA and CIBERSORT algorithms and employed GSEA, KEGG, and GO enrichment analyses to explore immune-related hub genes and potential biological mechanisms in ACLF. The hub genes and miRNAs involved in ACLF regulation were also further validated. Since there are few studies regarding ACLF mechanisms, adopting bioinformatics analyses provided valid information and guidance for our research. However, our current study also has some limitations. First, we used an animal model rather than samples from humans to validate the ACLF immune-related hub genes, and the results from animal studies should be treated with caution. Furthermore, although these hub genes and miRNAs were altered and might be involved in the development of ACLF, whether these genes can be new therapeutic targets for ACLF still needs to be explored. Therefore, further experiments are required to validate our findings and explore potential ACLF mechanisms.

Conclusion
In summary, we used different bioinformatics approaches to uncover potential ACLF molecular mechanisms. The biological processes and pathways that govern immune activation provided a meaningful insight for studying ACLF pathogenesis. Immune-related hub genes and important miRNAs that might be involved in critical ACLF functions were also identified. Finally, further studies are needed to identify the molecular mechanisms of these key genes, which might also contribute to the understanding of ACLF.

Data availability
The datasets (GSE142255 and GSE168048) analysed during the current study are available in the Gene Expression Omnibus (GEO) repository(http:// www. ncbi. nlm. nih. gov/ geo/).